
    Physiological Indicators for User Trust in Machine Learning with Influence Enhanced Fact-Checking

    © IFIP International Federation for Information Processing 2019. Trustworthy Machine Learning (ML) is one of the significant challenges of “black-box” ML because of its wide impact on practical applications. This paper investigates whether presenting the influence of training data points on machine learning predictions boosts user trust. A fact-checking framework for boosting user trust is proposed in a predictive decision-making scenario; it lets users interactively check training data points with different influences on the prediction through a parallel-coordinates visualization. This work also investigates the feasibility of physiological signals such as Galvanic Skin Response (GSR) and Blood Volume Pulse (BVP) as indicators of user trust in predictive decision making. A user study found that presenting the influences of training data points significantly increases user trust in predictions, but only for training data points with higher influence values under the high model performance condition, where users can justify their actions with facts more similar to the testing data point. The physiological signal analysis showed that GSR and BVP features correlate with user trust under different influence and model performance conditions. These findings suggest that physiological indicators can be integrated into the user interface of AI applications to automatically communicate variations in user trust in predictive decision making.

    Spatially Uniform ReliefF (SURF) for computationally-efficient filtering of gene-gene interactions

    Background: Genome-wide association studies are becoming the de facto standard in the genetic analysis of common human diseases. Given the complexity and robustness of biological networks, such diseases are unlikely to be the result of single points of failure but instead likely arise from the joint failure of two or more interacting components. The hope in genome-wide screens is that these points of failure can be linked to single nucleotide polymorphisms (SNPs) which confer disease susceptibility. Detecting interacting variants that lead to disease in the absence of single-gene effects is difficult, however, and methods that exhaustively analyze sets of these variants for interactions are combinatorial in nature, which makes them computationally infeasible. Efficient algorithms which can detect interacting SNPs are needed. ReliefF is one such promising algorithm, although it has a low success rate for noisy datasets when the interaction effect is small. ReliefF has been paired with an iterative approach, Tuned ReliefF (TuRF), which improves the estimation of weights in noisy data but does not fundamentally change the underlying ReliefF algorithm. To improve the sensitivity of studies using these methods to detect small effects, we introduce Spatially Uniform ReliefF (SURF). Results: SURF's ability to detect interactions in this domain is significantly greater than that of ReliefF. Similarly, SURF in combination with the TuRF strategy significantly outperforms TuRF alone for SNP selection under an epistasis model. This increase in success rate does not require an increase in algorithmic complexity, and it is achieved even with the removal of a nuisance parameter from the algorithm. Conclusion: Researchers performing genetic association studies and aiming to discover gene-gene interactions associated with increased disease susceptibility should use SURF in place of ReliefF. For instance, SURF should be used instead of ReliefF to filter a dataset before an exhaustive MDR analysis. This change increases the ability of a study to detect gene-gene interactions. The SURF algorithm is implemented in the open source Multifactor Dimensionality Reduction (MDR) software package available from http://www.epistasis.org.
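The abstract above does not spell out how Relief-style weighting works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that the "nuisance parameter" removed is a fixed neighbour count and that neighbours are instead all samples closer than the mean pairwise distance; the function name `threshold_relief`, the Hamming distance, and the toy genotypes are illustrative assumptions.

```python
from itertools import combinations


def hamming(a, b):
    """Number of attributes at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def threshold_relief(X, y):
    """Relief-style attribute weights using all neighbours within a threshold.

    X: list of equal-length genotype vectors (e.g. SNPs coded 0/1/2).
    y: list of class labels (e.g. 0 = control, 1 = case).
    Assumption of this sketch: the threshold is the mean pairwise distance.
    """
    n, m = len(X), len(X[0])
    pairs = list(combinations(range(n), 2))
    threshold = sum(hamming(X[i], X[j]) for i, j in pairs) / len(pairs)

    weights = [0.0] * m
    for i in range(n):
        for j in range(n):
            if i == j or hamming(X[i], X[j]) >= threshold:
                continue  # not a neighbour of sample i
            # Differing from a same-class neighbour (hit) lowers a weight;
            # differing from an other-class neighbour (miss) raises it.
            sign = -1.0 if y[i] == y[j] else 1.0
            for a in range(m):
                if X[i][a] != X[j][a]:
                    weights[a] += sign / n  # simplified normalisation
    return weights


if __name__ == "__main__":
    # Hypothetical toy data: attribute 0 tracks the class, the rest are noise.
    X = [[0, 0, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 0]]
    y = [0, 0, 0, 1, 1, 1]
    print(threshold_relief(X, y))
```

Attributes with the highest weights would be the ones kept when filtering a dataset before a more expensive exhaustive analysis.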

    Spatiotemporal patterns of population in mainland China, 1990 to 2010

    According to UN forecasts, global population will increase to over 8 billion by 2025, with much of this anticipated population growth expected in urban areas. In China, the scale of urbanization has been, and continues to be, unprecedented in terms of magnitude and rate of change. Since the late 1970s, the percentage of Chinese living in urban areas has increased from ~18% to over 50%. To quantify these patterns spatially, we use time-invariant or temporally explicit data, including census data for 1990, 2000, and 2010, in an ensemble prediction model. The resulting multi-temporal, gridded population datasets are unique in terms of granularity and extent, providing fine-scale (~100 m) patterns of population distribution for mainland China. For consistency purposes, the Tibet Autonomous Region, Taiwan, and the islands in the South China Sea were excluded. The statistical model and considerations for temporally comparable maps are described, along with the resulting datasets. The final mainland China population maps for 1990, 2000, and 2010 are freely available as products from the WorldPop Project website and the WorldPop Dataverse Repository.

    A Practical Platform for Blood Biomarker Study by Using Global Gene Expression Profiling of Peripheral Whole Blood

    Background: Although microarray technology has become the most common method for studying global gene expression, a plethora of technical factors across the experiment contribute to variability in genome-wide gene expression profiling of peripheral whole blood. A practical platform needs to be established in order to obtain reliable and reproducible data that meet clinical requirements for biomarker study. Methods and Findings: We applied globin reduction to peripheral whole blood samples and performed genome-wide transcriptome analysis using Illumina BeadChips. Real-time PCR was subsequently used to evaluate the quality of the array data and to elucidate how hemoglobin interferes with gene expression profiling. We demonstrated that, when applied in the context of standard microarray processing procedures, globin reduction results in a consistent and significant increase in the quality of beadarray data. Compared with their pre-globin reduction counterparts, post-globin reduction samples show improved detection statistics, lower variance, and increased sensitivity. More importantly, gender gene separation is remarkably clearer in post-globin reduction samples than in pre-globin reduction samples. Our study suggests that the poor data obtained from pre-globin reduction samples result from the high concentration of hemoglobin derived from red blood cells, which either interferes with target mRNA binding or produces a pseudo-binding background signal. Conclusion: We therefore recommend combining globin mRNA reduction in peripheral whole blood samples with hybridization on Illumina BeadChips as the practical approach for biomarker study.

    Sequence dependence of isothermal DNA amplification via EXPAR

    Isothermal nucleic acid amplification is becoming increasingly important for molecular diagnostics. Therefore, new computational tools are needed to facilitate assay design. In the isothermal EXPonential Amplification Reaction (EXPAR), template sequences with similar thermodynamic characteristics perform very differently. To understand what causes this variability, we characterized the performance of 384 template sequences, and used these data to develop two computational methods to predict EXPAR template performance based on sequence: a position weight matrix approach with support vector machine classifier, and RELIEF attribute evaluation with Naïve Bayes classification. The methods identified well and poorly performing EXPAR templates with 67–70% sensitivity and 77–80% specificity. We combined these methods into a computational tool that can accelerate new assay design by ruling out likely poor performers. Furthermore, our data suggest that variability in template performance is linked to specific sequence motifs. Cytidine, a pyrimidine base, is over-represented in certain positions of well-performing templates. Guanosine and adenosine, both purine bases, are over-represented in similar regions of poorly performing templates, frequently as GA or AG dimers. Since polymerases have a higher affinity for purine oligonucleotides, polymerase binding to GA-rich regions of a single-stranded DNA template may promote non-specific amplification in EXPAR and other nucleic acid amplification reactions.
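The abstract above names a position weight matrix (PWM) approach without describing it. As a rough illustration of the generic technique only, and not the authors' tool (which also involves a support vector machine classifier), a PWM can be built as per-position log-odds against a uniform background and used to score a candidate sequence. The function names, the pseudocount, the uniform background, and the four-base example sequences below are all hypothetical.

```python
import math


def build_pwm(sequences, alphabet="ACGT", pseudocount=1.0):
    """Per-position log2-odds of each base versus a uniform background.

    sequences: equal-length strings, e.g. templates known to perform well.
    The pseudocount keeps unseen bases from producing log(0).
    """
    background = 1.0 / len(alphabet)
    pwm = []
    for pos in range(len(sequences[0])):
        counts = {base: pseudocount for base in alphabet}
        for seq in sequences:
            counts[seq[pos]] += 1
        total = sum(counts.values())
        pwm.append(
            {base: math.log2((counts[base] / total) / background)
             for base in alphabet}
        )
    return pwm


def score(pwm, seq):
    """Sum of position-specific log-odds; higher means more motif-like."""
    return sum(column[base] for column, base in zip(pwm, seq))


if __name__ == "__main__":
    # Hypothetical "well-performing" examples, chosen only for illustration.
    good = ["CCAT", "CCTT", "CCGT"]
    pwm = build_pwm(good)
    print(score(pwm, "CCAT"), score(pwm, "GAAG"))
```

In a pipeline like the one the abstract describes, scores of this kind would presumably serve as input features for a downstream classifier rather than as a final decision.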

    Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    Background: Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results: The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with more than two dozen modern competing predictors show that MODAS provides the best overall accuracy, which ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reductions when compared against the best performing competing method on the two largest datasets. The proposed predictor achieves 58% accuracy for the membrane protein class, which most existing methods do not consider, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. Conclusions: The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

    Familial hypercholesterolaemia in children and adolescents from 48 countries: a cross-sectional study

    Background: Approximately 450 000 children are born with familial hypercholesterolaemia worldwide every year, yet only 2·1% of adults with familial hypercholesterolaemia were diagnosed before age 18 years via current diagnostic approaches, which are derived from observations in adults. We aimed to characterise children and adolescents with heterozygous familial hypercholesterolaemia (HeFH) and understand current approaches to the identification and management of familial hypercholesterolaemia to inform future public health strategies. Methods: For this cross-sectional study, we assessed children and adolescents younger than 18 years with a clinical or genetic diagnosis of HeFH at the time of entry into the Familial Hypercholesterolaemia Studies Collaboration (FHSC) registry between Oct 1, 2015, and Jan 31, 2021. Data in the registry were collected from 55 regional or national registries in 48 countries. Diagnoses relying on self-reported history of familial hypercholesterolaemia and suspected secondary hypercholesterolaemia were excluded from the registry; people with untreated LDL cholesterol (LDL-C) of at least 13·0 mmol/L were excluded from this study. Data were assessed overall and by WHO region, World Bank country income status, age, diagnostic criteria, and index-case status. The main outcome of this study was to assess current identification and management of children and adolescents with familial hypercholesterolaemia. Findings: Of 63 093 individuals in the FHSC registry, 11 848 (18·8%) were children or adolescents younger than 18 years with HeFH and were included in this study; 5756 (50·2%) of 11 476 included individuals were female and 5720 (49·8%) were male. Sex data were missing for 372 (3·1%) of 11 848 individuals. Median age at registry entry was 9·6 years (IQR 5·8–13·2). 10 099 (89·9%) of 11 235 included individuals had a final genetically confirmed diagnosis of familial hypercholesterolaemia and 1136 (10·1%) had a clinical diagnosis. Genetically confirmed diagnosis data or clinical diagnosis data were missing for 613 (5·2%) of 11 848 individuals. Genetic diagnosis was more common in children and adolescents from high-income countries (9427 [92·4%] of 10 202) than in children and adolescents from non-high-income countries (199 [48·0%] of 415). 3414 (31·6%) of 10 804 children or adolescents were index cases. Familial-hypercholesterolaemia-related physical signs, cardiovascular risk factors, and cardiovascular disease were uncommon, but were more common in non-high-income countries. 7557 (72·4%) of 10 428 included children or adolescents were not taking lipid-lowering medication (LLM) and had a median LDL-C of 5·00 mmol/L (IQR 4·05–6·08). Compared with genetic diagnosis, the use of unadapted clinical criteria intended for use in adults and reliant on more extreme phenotypes could result in 50–75% of children and adolescents with familial hypercholesterolaemia not being identified. Interpretation: Clinical characteristics observed in adults with familial hypercholesterolaemia are uncommon in children and adolescents with familial hypercholesterolaemia, hence detection in this age group relies on measurement of LDL-C and genetic confirmation. Where genetic testing is unavailable, increased availability and use of LDL-C measurements in the first few years of life could help reduce the current gap between prevalence and detection, enabling increased use of combination LLM to reach recommended LDL-C targets early in life. Funding: Pfizer, Amgen, Merck Sharp & Dohme, Sanofi–Aventis, Daiichi Sankyo, and Regeneron.
